* ABSTRACT



Research aimed at correcting words in text has focused on three progressively more difficult
problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-dependent
word correction. In response to the first problem, efficient pattern-matching and n-gram analysis
techniques have been developed for detecting strings that do not appear in a given word list. In 
response to the second problem, a variety of general and application-specific spelling correction 
techniques have been developed. Some of them were based on detailed studies of spelling error 
patterns. In response to the third problem, a few experiments using natural-language-processing 
tools or statistical-language models have been carried out. This article surveys documented findings 
on spelling error patterns, provides descriptions of various nonword detection and isolated-word error 
correction techniques, reviews the state of the art of context-dependent word correction techniques, 
and discusses research issues related to all three areas of automatic error correction in text. 
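
As a rough illustration of the first two problems, the sketch below flags tokens that are absent from a word list (nonword detection) and then ranks word-list entries by edit distance to each flagged token (isolated-word correction). It is a minimal sketch rather than any particular technique from the surveyed literature: the tiny LEXICON, the function names, and the max_distance threshold are all assumptions made for the example, and a practical system would use a far larger word list and a faster lookup structure.

```python
# Hypothetical toy word list; a real system would load a full dictionary.
LEXICON = {"the", "spelling", "error", "was", "corrected"}


def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn a into b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def find_nonwords(text: str, lexicon: set) -> list:
    """Nonword detection: alphabetic tokens that are not in the word list."""
    return [t for t in text.lower().split() if t.isalpha() and t not in lexicon]


def suggest(nonword: str, lexicon: set, max_distance: int = 2) -> list:
    """Isolated-word correction: word-list entries within max_distance edits,
    closest first (ties broken alphabetically)."""
    scored = sorted((edit_distance(nonword, w), w) for w in lexicon)
    return [w for d, w in scored if d <= max_distance]


if __name__ == "__main__":
    for token in find_nonwords("the speling eror was corrected", LEXICON):
        print(token, "->", suggest(token, LEXICON))
```

Neither function looks at neighboring words, so a misspelling that happens to form another valid word passes undetected; that case is the third problem, context-dependent word correction.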



http://portalbeta.acm.org/citation.cfm?id=146380&coll=ACM&dl=ACM& 12/30/03 



Technique for automatically correcting words in text 



Page 2 of 16 



3 Alfred V. Aho , Mar g aret J. Corasick, Efficient strin g matching: an aid to bibliographic search , 
Communications of the ACM, v. 18 n.6, p. 333-340, June 1975 

4 AHO, A. V., AND PETERSON, T.G. 1972. A minimum distance error-correcting parser for context 
free languages. SIAM J. Comput. 1, 4 (Dec), 305-312. 

5 Cyril N. Albeiga, String similarity_and misspellings, Communications of the ACM, v.l0_n.5,_p.302- 
313, May 1967 

6 Robert B. Allen , Candace A. Kamm, A recurrent neural network for word identification from 
continuous phoneme strings, Proceedings of the 1990 conference on Advances in neural information 
processing systems 3, p. 206-212, O ctob er 1 990, Denver , C olora do, Un ited States 

7 John L ,Amott_ / _ Alan, F, Newell , Norman Aim, Pr edictio n and conversati onal m omentum in an 
augmentative communication system , Communications of the ACM , v. 35 n.5 f p. 46-57, May 1992 

8 ANGELL, R. C, FREUND, G. E., AND WILLETT, P. 1983. Automatic spelling correction using a 
trigram similarity measure. Inf. Process. Manage. 19,255 261. 

9 ATWELL, E., AND ELLIOTT, S. 1987. Dealing with ill-formed English text (Chapter 10). In The 
Computational Analysis of English: A Corpus- Based Approach. R. Garside, G. Leach, G. Sampson, Ed. 
Longman, Inc. New York. 

10 BAHL, L. R., BROWN, P. F., DESOUZA, P. V., AND MERCER, R.L 1989. A tree-based statistical 
language model for natural language speech recognition. IEEE Trans. Acoust. Speech Stg. Process. 
37, 7, (July), 1001-1008. 

11 BAHL, L R., JELINEK- F., AND MERCER, R.L 1983. A maximum likelihood approach to 
continuous speech recognition. IEEE Trans. Patt. Anal. Machine Intell. PAMI-5, 2 (Mar.), 179 190. 

12 Jon_Mntlev f Programming pearls, Communi c ations of the ACM , v. 28 n .5 f p.456-462, Ma y 1985 

13 Michael Allen Bickel , Automatic co r r ection t o m is s p e ll e d names : a fou rth- g e n e ra t i on language 
approach, Communications of the ACM, v. 30 n.3, p.224-228, March 1987 

14 BLAIR, C. R. 1960. A program for correcting spelling errors. Inf. Contr. 3, 60 67. 

15 BLEDSOE, W. W., AND BROWMNG, I. 1959. Pattern recognition and reading by machine. In 
Proceedings of the Eastern Joint Computer Conference, vol. 16, 225-232. 

16 BOCAST, A. K. 1991, Method and apparatus for reconstructing a token from a Token Fragment. 
U.S. Patent Number 5,008,818, Design Services Group, Inc. McLean, Va. 

17 BOIWE, R. H. 1981. Directory assistance revisited. AT&T Bell Labs Tech. Mem. June 12, 1981. 

18 BROWN, P. F., DELLA PIETRA, V. J., DESOUZA, P. V., AND MERCER, R. L 1990a. Class-Based n- 
Gram Models of Natural Language. 

19 Peter Brown . John Cocke , Stephen Delia Pietra , Vincent J. Delia , Fredrick Jelinek , John D. 
Laffertv , Robert L. Mercer , P aul S. Roosin , A statistical a p proach to machine translation, 
Com putational Linguistics , v. 16 n.2 , p. 79-8 5, Jun. 1990 

20 BROWN, P., DELLA PIETRA, S., DELLA PIETRA, V., AND MERCER, R. 1991. Word sense 
disambigaation using statistical methods. In Proeeedtngs of the 29th Annual Meeting of the 



http://portalbeta.acm.org/citation.cfm?id=146380&coll=ACM&dl=ACM& 12/30/03 



Technique for automatically correcting words in text 



Page 3 of 16 



Association for Computational Linguistics (Berkeley, Calif., June), ACL, 264 270. 

21 BURR, D. J. 1983. Designing a handwriting reader. IEEE Trans. Patt. Anal. Machine Intell. PAMI- 
5, 5 (Sept.), 554 559. 

22 BURR, D. J. 1987. Experiments with a connactionist text reader. In IEEE International 
Conference on Neural Networks (San Diego, Calif., June). IEEE, New York, IV:717-724. 

23 CARBERRY, S. 1984. Understanding pragmatically ill-formed input. In Proceedings of the 10th 
International Conference on Computational Linguistics. ACL, 100-206. 

24 CARBONELL, J. G., AND HAYES, PJ. 1983. Recovery strategies for parsing extragrammatical 
language. Amer. J. Comput. Ltng. 9, 3-4 (July-Dec), 123 146. 

25 CARTER, D.M. 1992. Lattice-based word identification in CLARE. In Proceedings of the 30th 
Annual Meeting of the Association for Computational Linguistics (Newark, Del., June 28-July 2). ACL, 
159-166. 

26 CHERKASSKY, V., AND VASSILAS, N. 1989a. Backpropagation networks for spelling correction. 
Neural Net. 1, 3 (July), 166-173. 

27 CHERKASSKY, V., AND VASSILAS, N. 1989b. Performance of back-propagation networks for 
associative database retrieval. Int. J. Comput. Neural Net. 

28 CHERKASSKY, V., RAO, M., AND WECHSLER, H. 1990. Fault-tolerant database retrieval using 
distributed associative memories. Inf. Sci. 46, 135-168. 

29 Vladimir Cherkassky , Karen Fassett , Nikolaos Vassilas, Linear Algebra Approach to Neural 
Associative Memories and Noise performance of Neural Classifiers, IEEE Transactions on Computers, 
v.40 n.12 , p. 1 429- 14 35, December 1991 

30 CHERKASSKY, V., VASSILAS, N., BRODT, G. L, AND WECHSLER, H. 1992. Conventional and 
associative memory approaches to automatic spelling checking. Eng. Appl. Artif. Intell. 5, 3. 

31 CHERRY, L, AND MACDONALD, N. 1983. The Writer's Workbench software Byte, (Oct.), 241 
248. 

32 CHOUEKA, Y. 1988. Looking fbr needles in a haystack. In Proceedtngs of RIAO, 609 623 

33 CHURCH, K.W. 1988. A stochastic parts program and noun phrase parser for unrestricted text. 
In Proceedings of the 2nd Applted Natural Language Processing Conference (Austin, Tex, Feb.). ACL, 
136 143. 

34 CHURCH, K. W., ANO GALE, W.A. 1991a. Probability scoring for spelling correction. Stat 
Camput. 1, 93 103. 

35 CHURCH, K. W., AND GAbE, W. A. 1991b. Enhanced Good-Turmg and cat-cal Two new methods 
for esmnating probabilities of English bigrams. Comput. Speech Lung. 1991. 

36 COHEN, G. 1980. Reading and searching for spelling errors. In Cognitive Processes in Spelhng. 
Uta Frith, Ed. Academic Press, London. 

37 COtLER, C H., CHURCH, K. W., AND LIBERMAN, M. Y. 1990. Morphology and rhyming: Two 
powerful alternatives to letter-to-sound rules for speech synthesis. In Proceedings of the Conference 



http://portalbeta.acm.org/citatio^ 12/30/03 



Technique for automatically correcting words in text 



Page 4 of 16 



on Speech Synthesis. European Speech Communication Association. 

38 CONTANT, C, AND BRUNELLE, E. 1992 Exploratexte: Un analyseur a I'affut des erreurs 
grammaticales. In Actes du colloque lexiquesgrammatres compares, Universite du Quebec a 
Montreal. In French. 

39 William H. Cushman , Purnendu S. Ojha , Cathleen M. Daniels, Usable OCR: what are the 
minima requirements?, Proceedings of the SIGCHI confere n ce on Human factors in 
computing systems: Empowering people, p. 145-152 , A pril 01-05 , 1990, Seattle, Washington 7united 
States 

40 DAHL, P, AND CHERKASSKY, V. 1990. Combined encoding in associative spelling checkers. Umv. 
of Minnesota EE Dept. Tech. Rep. 

41 Fred J. Damera u, Ev aluatin g com p uter- g enerated doma in -oriented vocabularies. Information 
Processing and Management: an I nt er nationa l J our nal , v. 2 6 n.6, p. 7 9 1-801 , 1990 

42 F red J. Damerau , A technique for com p uter detection and correction of spelling errors, 
Co mmunications of the ACM , v. 7 n.3 , p. 171-176 , March 1964 

4 3 Fred J. Damerau , Eric Mays, An examination of undetected typing errors , Inform ation Processing 
and Management: an In tern ation a l Journal, v.25 n.6, p. 659-664, 1989 

44 Leon Davidson, Retrieval of miss pelled na mes in a n airlines p assen ge r recor d system, 
Communications of the ACM, v. 5 n.3, p. 169-171, M a rc h 1962 

45 DEERWESTER, S., DUMAIS, S. T., FURNAS, G. W., LANDAUER, T K., AND HARSHMAN, R. 1990. 
Indexing by Latent Semantic Analysis. JASIS 41, 6, 391-407. 

46 DEFFNER, R., EDER, K, AND GEiGER, I-I. 1990a. Word recognition as a first step towards natural 
langlmge processing with artificial neural nets. In Proceedings of KONNAI-90. 

47 DEFFNER, R., GEIGER, H., KAHLER, R., KREMPL, T., AND BRAUER, W. 1990b. Recognizing words 
with connectionist architectures. In Proceedings of INNC-90-Parts (Paris, France, July), 196. 

48 DEHEER, T. 1982. The application of the concept of homeosemy to natural language information 
retrieval. Inf. Process. Manage. 18, 229-236. 

49 DELOCttE, G., AND DEmLh F. 1980. Order information redundancy of verbal codes in French and 
English' Neurolinguistic implications. J. Verbal Learn. Verbal Behav. 19, 525-530. 

50 Patrick W. Demasco , Kathleen F. McCoy, Generating text from compressed input: an intellig ent 
interface for people with severe moto r impairments, Communicat ions of the AC M, v.35 n.5, p.68:ZQ/ 
May 1992 

51 DEROUAULT, A.-M., AND MERIALDO, B. 1984a. Language modeling at the syntactic level. In 
Proceedmgs of the 7th International Conference on Pattern Recognition (Montreal, Canada, July 30- 
Aug. 2), 1373-1375. 

52 DEROUAULT, A.-M, AND MER~ALDO, B. 1984b. TASF: A stenotypy-to-French transcription 
system. In Proceedings of the 7th International Conference on Pattern Recognition (Montreal, 
Canada, July 30-Aug. 2), 866-868. 

53 Michael R. Dunlavey . Lance A. Miller, Technical corrections: Onspellina correction and beyond, 
Communications of the ACM , v. 24 n.9, p. 608-609 , Sept. 1981 



http://portalbeta.acm.org/citation.cfm?id=146380&coll-ACM&dl=ACM& 12/30/03 



Technique for automatically correcting words in text Page 5 of 16 

54 Ivor Durham , David A. Lamb , James B. Saxe, Spelling correction in user interfaces , 



Communications of the ACM, v. 26 n,10, p. 764-773, Oct, 1983 



55 EASTMAN, C. M., AND MCLEAN, D. S. 1981. On the need for parsing ill-ibrmed input. Amer. J 
Comput. Ling. 7.4, 257. 

56 ELLIOTT, R. J. 1988. Annotating spelling list worda with affixation classes. AT & T Bell Labs Int. 
Mem. Dec. 14. 

57 ELLIS, A. W. 1979 Slips of the pen. Vis. Lang. 13, 265-282. 

58 ELLIS, A. W. 1982. Spelling and writing (and reading and speaking). In Normahty and Pathology 
m Cognttwe Functwns, A. W Elhs, Ed. Academic Press, London. 

59 FA$$, n., AND WILKS, Y. 1983. Preference semantics, fil-formedness, and metaphor Amer J. 
Comput. Ling. 9.3 4 (July-Dec), 178 189. 

60 Pam el a E Fink , Ala n W Bier mann , Th e corr ect ion of ill-form ed in put us ing h isto ry-b a sed 
ex pectation with applications to speech understanding , Computational Lin g uistics, v. 12 n.l , p. 13-36 , 
Jan/March 1986 

61 FORNEY, G. D., JR. 1973. The Viterbi algorithm. Prec. IEEE 61, 3 (Mar.), 268-278. 

62 Edward A. Fox , Oi Fan Chen , Lenwood S. Heath, A faster algorithm for constructing minimal 
perf ect hash functions , Proceedi n gs of the 15th annual in t ernational ACM SI G I R co n fere nce on 
Research and develo pment in information retrieva l, p.266-273, June 2 1 -2 4, 1992 , C o p enha gen, 
Denmark 

63 Edward Fredkin, Trie memory, Communications of the ACM, v. 3 n.9, p.490-499, Aug. 1960 

64 FROMKIN, V., ED. 1980. Errors in Linguistic Performance: Shps of the Tongue, Ear, Pen and 
Hand. Academic Press, New York, 1980. 

65 GALE, W. A., AND CHURCH, K.W. 1990. Estimation procedures for language context: Poor 
estimates are worse than none. In Proceedings of Compstat-90 (Dubrovnik, Yugoslavia). Springer- 
Verlag, New York, 69-74. 

66 Stephen I. Gallant, A practical approach for representing context and for performing word sense 
disamb igu ation usin g neural networks, Neural Computation , v. 3 n.3 , p.293-309 , Fall 1991 

67 GARRETT, M. 1982. Production of speech: Observations from normal and pathological language 
use. In Normality and Pathology ~n Cognttive Functmns, A. W. Ellis, Ed. Academic Press, London. 

68 GARSIDE, R., LEACH, G., AND SAMPSON, G. 1987. The Computatwnal Analysis of English: A 
Corpus-Based Approach. Longman, Inc., New York. 

69 GENTNER, D. R., GRUDIN, J., LAROCHELLE, S., NOR- MAN, D. A., AND RUMELHART, D. E. 1983. 
Studies of typing from the LNR typing research group. In Cognitive Aspects of Skilled Typewriting, W. 
E. Cooper, Ed. Springer- Verlag, New York. 

70 GERSHO, M., AND REITER, R. 1990. Information retrieval using self-organizing and 
heteroassociative supmwised neural networks. In Procee&ngs oflJCNN (San Diego, Calif. June). 

71 GOOD, I.J. 1953. The population frequencies of species and the estimation of population 



http://portalbeta.acm.org/citation.cfm?id-146380&coll=ACM&dl-ACM& 12/30/03 



Technique for automatically correcting words in text 



Page 6 of 16 



parameters Biometrika 40, 3 and 4 (Dec), 129-264. 

72 GORIN, R. E. 1971. SPELL: A spelling checking and correction program. Online documentation 
for the DEC- 10 computer. 

73 A. Goshtasby . R. W. Ehrich, Contextual word recognition using probabilistic relaxation labeling, 
Patt er n Recognition, v. 21 n.5, p.455-4 62, 1 988 

74 GRANGER, R.H. 1983. The NOMAD system: Expectation-based detection and correction of errors 
during understanding of syntactically and semantically ill-formed text. Amer. J. Comput. Ling. 9, 3-4 
(July-Dec), 188-196. 

75 GRUDIN, J. 1983. Error patterns in skilled and novice transcription typing. In Cognitive Aspects 
of Skilled Typewriting, W. E. Copper, Ed. Springer-Verlag, New York. 

76 GRUHIN. J. 1981. The organization of serial order in typing. Ph.D. dissertation Univ. of California, 
~an Diego. 

77 Patrick A. V . Hall , Geoff R. Dowlin g, Ap proximate Strin g Match i ng, ACM C om putin g Surveys 
(CSUR). v.12 n.4, P.381-402, Dec. 1980 

78 HANSON, S. J., AND KEGL, J. 1987. PARSNIP: A connectionist network that natural language 
grammar from exposure to natural language sentences. In Proceedings of the Cognitive Science 
Conference. 

79 HANSON, A. R., RISEMAN, E. M., AND FISHER, E., 1976. Context in word recognition. Part. 
Recog. 8, 35-45. 

80 HARMON, L D. 1972.Automatic recognition of print and script. Proc. IEEE 60, (Oct.), 1165 1176. 

81 HAWLEY, MJ. 1982. Interactive spelling correction in Unix: The METRIC Library. AT&T Bell Labs 
Tech. Mem., August 31. 

82 HEIDORN, G.E. 1982. Experience with an easily computed metric for ranking alternative parses. 
In Proceedings of the 20th Annual Meeting of the Associatzon for Computational Linguistics (Toronto, 
Canada). ACL, 82-84. 

83 HEIDORN, G. E., JENSEN, K., MILLER, L A., BYRD, R. J., AND CHODOROW, M.S. 1982. The 
EPIS- TLE text-critiquing system. IBM Syst. J. 21, 3,305-326. 

84 HENSELER, J., SCHOLTES, J. C, AND VERDOEST, C. R. J. 1987. The design of a parallel 
knowledge-based optical character recognition system. Master of Science Theses, Dept. of 
Mathematics and Informatics, Delft Univ. of Technology. 

85 HINDLE, D. 1983. User manual for Fidditch, a deterministic parser. Tech. Mere. 7590 142, Naval 
Research Lab. 

86 Ho, T. K., HULL, J. J., AND SRIHARI, S. N. 1991. Word recognition with multi-level contextual 
knowledge. In Proceedings of IDCAR-91 (St. Malo, France), 905-915. 

87 HOTOPF, N. 1980. Slips of the pen. In Cognitive Processes in Spelling, Uta Frith, Ed. Academic 
Press, London. 

88 HULL, J.J. 1987. Hypothesis testing in a computational theory of visual word recognition. In 



http://portalbeta.acm.org/citati 12/30/03 



Technique for automatically correcting words in text 



Page 7 of 16 



Proceedings of AAAI-87, 6th National Conference on Artificial Intelligence, vol. 2 (Seattle, Wash., July 
13 17). AAAI, 718 722. 

89 HULL, 3. J., AND SRIHARI, S. N. 1982. Experiments in text recognition with binary n-gram and 
Viterbi algorithms. IEEE Trans. Patt. Anal. Machine Intell. PAMI-4, 5 (Sept.), 520 530. 

90 F. Jelinek , B. Merialdo , S. Roukos , M. Strauss, A dynamic language model for speech 
reco g nition. Proceedin g s of a w orkshop on Speech and natural lan g uag e, p. 293-295, June 1991 , 
P ac i f i c Gr ov e, C alifornia , Uni te d S tat e s 

91 JENSEN, K., HEIDORN, G. E., MILLER, L. A., AND RAVIN, Y. 1983. Parse fitting and prose fixM 
ing: Getting a hold on ill-formedness. Amer. J. Comput. Ling. 9, 3-4 (July-Dec), 147 160. 

92 JOHNSTON, J. C, AND MCCLELLAND, J. L 1980. Experimental tests of a hierarchical model of 
word identification. J. Verbal Learn. Verbal Behav. 19, 503-524. 

93 JONES, M. A., STORY, G. A., AND BALLARD, B. W. 1991. Integrating multiple knowledge sources 
in a Bayesian OCR post-processor. In Proceedtngs of IDCAR-91 (St Malo, France), 925-933. 

94 JOSHI, A.K. 1985. How much context-sensitivity is necessary for characterizing structural 
descriptions-Tree Adjoining Grammars In Natural Language Processing Theoretzcal, Computatzonal 
and Pwcholog~cal Perspectives, D. Dowty, L. Karttunen, A. Zwicky, Ed. Cambridge University Press, 
New York. 

95 S. K ahan , T . Pavlidis , H . S. Bair d , On t h e recogn iti on o f printed ch a ract e rs o f any font and size, 

IEEE Transactions on Pattern Analysis and Machine Intelligence, v. 9 n.2 r p. 274-288, March 1987 

96 KASHYAP, R. L, AND OOMMEN, B. J. 1981 An effective algorithm for string correction using 
generalized edit distances. Inf Sci 23, 123-142. 

97 KASHYAP, R. L, AND OOMMEN, B.J. 1984. Spelling correction using probabilistic methods. Part 
Recog. Lett. 2, 3 (Mar.), 147 154. 

98 KEELER, J., AND RUMELHART, D.E. 1992. A selforganizing mtegreted segmentation and 
recognition neural net. In Advances ~n Neural ln/~rmation Proccsszng Systems, vol. 4. J. E. Moody, 
S. J. Hanson, R. P. Lippmann, Ed. Morgan Kaufmann, San Mateo, Calif., 496-503. 

99 KEMPEN, G., AND VOSSE, T. 1990. A languagesensitive text editor for Dutch. In Proceedings of 
the Computers and Writing 111 Conference (Edinburgh, Scotland, Apr ) 

100 KERNIGHAN, M.D. 1991. Specialized spelling correction for a TDD system AT & T Bell Labs 
Tech. Mere., August. 30. 

101 KERNIGHAN, M. D., AND GALE, W.A. 1991. Varmtions on channel-frequency spelling correction 
in Spamsh. AT&T Bell Labs Tech. Mem., September. 

102 KERNIGHAN, M. D., CHURCH, K. W., AND GALE, W. A. 1990. A spelling correction program 
based on a noisy channel model. In Proceedings of COL- ING-90, The 13th International Conference 
on Computational Linguistics, vol. 2 (Helsinki, Finland). Hans Kar]gren, Ed. 205-210. 

103 Donald E. Knuth , The art of computer pro g rammin g , volume 3: (2nd ed.) sorting and 
searching, Addison Wesley Longman Publishing Co. r Inc., Redwood City, CA, 1998 

104 T. Kohonen, Contentaddressable Memories, Springer- Verlag New York, Inc.. Secaucus, NJ f 
1987 



http://portalbeta.acm.org/ci^ 12/30/03 



Technique for automatically correcting words in text 



Page 8 of 16 



105 T, Ko ho nen , Self-o rganization a nd a ssociative memory:: 3rd„edition, Spring er-VerlagJ\lew York,. 
Inc.. New York, NY, 1989 

106 KUCERA, H., AND FRANCIS, W.N. 1967. Computational Analysis of Present-Day American 
Engltsh Brown University Press, Providence, R.I. 

107 KUKICH, K. 1988a. Variatmns on a back-propagation name recognition net. In Proceedings of 
the Advanced Technology Conference, vol 2 (May 3-5). U.S. Postal Service, Washington D.C., 722- 
735. 

108 KUKICH, K. 1988b. Back-propagation topologies for sequence generation. In Proceedings o/ the 
IEEE International Conference on Neural Networks, vol. 1 (San Diego, Calif., July 24 27). IEEE, New 
York, 301-308. 

109 KUKICH, K. 1990 A comparison of some novel and traditional lexical distance metrics for 
spelling correction. In Proceectzngs of INNC- 90-Paris (Paris, France, July), 309-313. 

110 Karen Kukich , Spellin g corre c tion f or the telecom m unications network for the de af, 
Com munications of the ACM , v.3 5 n.5, p . 80-90, M ay 1992 

111 LANDAUER, T. K, AND STREETER~ L. A. 1973. Structural differences between common and rare 
words. J. Verbal Learn. Verbal Behav. 12, 119-131. 

112 LEE, Y.-H., EVENS, M., MICHAEL, J. A., AND ROVICK, A.A. 1990. Spelling Correction for an 
intelligent tutoring system. Tech. Rep., Dept. of Computer Science, Illinois Inst, of Technology, 
Chicago 

113 TEIN, V I. 1966. Binary codes capable of correcting deletions, insertions and reversals. Sov. 
Phys. Dokl. 10, (Feb), 707-710. 

114 AN, M. Y., AND WALKER. D.E. 1989. ACL Data Collectmn mitmtlve: First release. Fznite String 
15, 4 (Dec), 46-47. 

115 Robert A. Wagner , Ro y Lowrance, An Extension of the Strin q -to-Strin q Correction Problem , 
Journal of the ACM (JACM), v.22 n.2, p. 177-183, April 1975 

116 O., BURGES, C. J. C, LECuN, Y, AND DENKER, J.S. 1992. Multi-digit recogmtion using a space 
displacement neural network. In Advances in Neural Information Processzng Systems, vol. 4, J. E 
Moody, S. J. Hanson, R. P. Lippnmnn, Ed. Morgan Kaufmann, San Mateo, Calif, 488-495. 

117 Eric Mays , Fred J. Damerau , Robert L. Mercer, Context based s pelling correction. Information 
Processing and Management: an International Journal, v. 27 n.5, p. 517-522, 1991 

118 J. L, AND RUMELHART. D.E. 1981 An interactive activation model of context effects in letter 
perception. Psychol. Rev. 88, 5 (Sept.), 375 407. 

119 K. F. McCoy, Generating context-sensitive res po nses to obj ect-related misconce ptions, Artificial 
Intellig ence , v.41 n.2 , p. 157-195 , Dec. 1989 

120 Y, M. D 1992. Development of a spelling li~t. IEEE Trans_ Comrnun. COM-30, i (Jan.)/ 91 99. 

121 L.G. 1988. Cn yur cmputr reed ths. In Proceedinss of the 2nd Applzed Natural Language 
Processing Conference (Austin, Tex, Feb.). ACL, 93-100. 



http://portalbeta.acm.org/citation.cfm?id=146380i&coll=ACM&dl=ACM& 12/30/03 



Technique for automatically correcting words in text 



Page 9 of 16 



122 S., HAYES, P. J., AND FAIN J. 1985. Controlling search in fiemble parsing. In Proceedings of the 
Internatzonal Jmnt Conference on Artificml Intelhgence. Morgan Kaufman, San Marco, Calif., 786- 
787. 

123 Roger Mitton, Spellin g ch eckers , s pellin g correct or s and the m i s sp e ll in g s of poor spellers, 
Information Processing and Mana g ement: an International JournaL v. 23 n.5, p. 495-505, Sept. 1987 

124 R Mitton, A partial dictionary of En g lish in computer-usable form. Literary & Lin guistic 
Computing, v.l n.4, p. 214-215, 1986 

125 R. 1985. A collection of computer-readable corpora of English spelling errors. Cog. 
Neuropsychol. 2, 3,275-279. 

126 AND FRAENKEL, A. S. 1982a. Retrieval in an environment of faulty texts or faulty queries. In 
Proceedings of the 2nd International Conference on Improving Database Usability and 
Responsiveness (Jerusalem), P. Scheuerman, Ed. Academic Press, New York, 405-425. 

127 M. Mor , A. S. Fraenkel, A hash code method for detecting and correcting spelling errors, 
Communications of the ACM, v. 25 n.12, p. 93 5-93 8, D ec 1982 

128 Howard L Morgan, Spelling„correction in systems prog rams , Com munications of the ACM, v. 13 
n.2, p.90-94, Feb 1970 

129 R., AND CHERRY, L.L. 1975. Computer detection of typographical errors. IEEE Trans. Profess. 
Commun. PC-18, 1, 54-63. 

130 E., JR., AND THARP, A.L. 1977. Correcting human error in alphanumeric terminal input. Inf. 
Process. Manage. 13, 329-337. 

131 ER, G. L. 1966. Introduction to Dynamic Programming. Wiley, New York. 

132 J., PHILLIPS, V. L, AND DUMAIS, S. T. 1992. Retrieving imperfectly recognized handwritten 
notes. Behav. Inf. Teeh. 

133 M. K., AND RUSSELL, R. C. 1918. U.S. Patent Numbers, 1,261,167 (1918) and 1,435,663 
(1922). U.S. Patent Office, Washington, D.C. 

134 T., TANAKA, E., AND KASAI, T. 1976. A method of correction of garbled words based on the 
Levenshtein metric. IEEE Trans. Comput. 25, 172-177. 

135 T., MACHI, F., EVANS, B., AND TOM, J. 1988. Computational techniques for improved name 
search. In Proceedings of the 2nd Annual Applied Natural Language Conference (Austin, Tex, Feb.). 
ACL, 203-210. 

136 E, K., CHIGNELL, M., KHOSHAFIAN, S., AND WONG, H. 1990. Intelligent databases. A/ Expert, 
(Mar.), 38 47. 

137 James L. Peterson , Computer prog ra m s for detectin g and co r rectin g spell i n g errors, 
Co mmunications of the ACM , v. 23 n.12 , p.676-687 , Dec. 1980 

138 James L. Peterson , A note on undetected ty pi ng errors, Communications of the ACM , v. 29 n.7 , 
p.633-637, July 1986 

139 POLLOCK, J. J., AND ZAMORA, A. 1983. Collection and characterization of spelling errors in 



http://portalbeta.acm.org/citation.cfm?id=146380&coll=ACM&dl=ACM 12/30/03 



Technique for automatically correcting words in text 



Page 10 of 16 



scientific and scholarly text. J. Amer. Soc. Inf. Sci. 34, 1, 51 58. 

140 Jose ph J. P olloc k , Antonio Zamora, Automatic spellin g correction in scientific and scholarly 
text , Communications of the ACM, v,27 n.4 , p.358-368 , April 1984 

141 RAMSaAW, L A. 1989. Pragmatic knowledge for resolving ill-formedness. Tech. Rep. No. 89-18, 
BBN, Cambridge, Mass. 

142 RHYNE, J. R., AND WOLF, C. G. 1991. Paperlike user interfaces. RC 17271 (#76097), IBM 
Research Division, T. J. Watson Research Center, Yorktown Heights, N.Y. 

143 RHYNE, J. R., AND WOLF, C. G. 1993. Recognition-based user interfaces. In Advances m 
Human-Computer Interaction, vol. 4, H. R. Hartson and D. Hix, Ed. Ablex, Norwood, NJ. 

144 RICHARDSON, S. D., AND BRADEN-HARDER, L. C. 1988. The experience of developing a 
largerscale natural language text processing system: CRITIQUE. In Proceedings of the 2nd Annual 
Applied Natural Language Conference, (Austin, Tex. Feb.). ACL, 195-202. 

145 E. M., AND HANSON, A.R. 1974. A contextual postprocessing system for error correction using 
binary n-grams. IEEE Trans. Cornput. C-23, (May), 480-493. 

146 Alexander M. Robertson , Peter Willett , Searchin g for historical word-forms in a database of 
17th-century English text using spelling-correction methods. Proceedings of the 15th annual 
internatio nal AC M SIGIR confer en ce o n Re search and devel op ment in inform ation retrieval , p. 256- 
265 , June 21-24, 1992 , Copenhagen, Denmark 

147 ROSENFELD, A., HUMMEL, R. A., AND ZUCKER, S. W. 1976. Scene labeling by relaxation 
operations. IEEE Trans. Syst. Man Cybernet. SMC-6, 6, 420-433. 

148 RUMELHART, D. E., AND MCCLELLAND, J.L 1982. An interactive activation model of context 
effects in letter perception. Psychol. Rev. 89, 1, 60-94. 

149 D . E. R u melha r t , G. E . H i nton , R . J. W illiams , Learn ing interna l r epresentations by error 
propagation, Parallel distributed proc essing : ex p lo ra tions in the mic r ostruct ure of cognition, vol. 1: 
foundations, MIT Press, Cambridge, MA, 1986 

150 Gerard Salton, Automatic text processing: the transformation, analysis, and retrieval of 
i nfo rmation by com puter, Addison- Wesley Longman Pub lishing Co., Inc., Boston, MA , 1989 

151 SAMPSON, G. 1989. How fully does a machineusable dictionary cover English text. Lit. Ling. 
Cornput. 4, 1, 29-35. 

152 SANKOFF, D., AND KRUSKAL, J. B. 1983. Time Warps, String Edits, and Macromolecules: The 
Theory and Practice of Sequence Comparison. Addison-Wesley, Reading, Mass. 

153 SANTOS, P. J., BALTZER, A. J., BADRE, A. N., HENNE- MAN. R. L. AND MILLER. M. S. 1992. On 
handwriting recognition system performance: Some experimental results. In Proceedings of the 
Human Factors Soctety 36th Annual Meeting (Atlanta, Ga., Oct. 12-16). Human Factors Society. 

154 SCHANK, R. C, LEBOWITZ, M., AND BIRNBAUM, L. 1980. An integrated understander. Am. J. 
Cornput. Ltng. 6, 1, 13 30. 

155 B. A. Sheil, Median split trees: a fast lookup technique for frequently occurinq keys f 
Communicat ions of the ACM , v. 21 n.ll , p. 947-958, Nov. 1978 



http://portalbeta.acm.org/citation.cfm?id=146380&coll=ACM&dl=ACM& 12/30/03 



Technique for automatically correcting words in text 



Page 11 of 16 



156 SH~NOHAL, R, AND TOUSSAINT, G. T 1979a Experiments in text recognition with the modified 
Viterbi algorithm. IEEE Trans Patt. Anal. Machine Intell. PAMI-1, 4 (Apr), 184 193. 

157 SHiNGHAL, R., AND TOUSSAINT, G.T. 1979b. A bottom-up and top-down approach to using 
context in text recognition. Dzt. J. Man-Machine Stud. 11,201 212. 

158 SIDOROV, A. A. 1979. Analysis of word similarity on spelling correction systems. Program. 
Cornput. Softw 5, 274 277. 

159 R. M. K . Sinha , B. P rasada, V i s ual text recog nition through cont e xtual pro c e ss ing, Pattern 
Reco g nition, v. 21 n.5 , p.463-479, 1988 

160 SITAR, E.J. 1961. Machine recognition of cursive script: The use of context for error detection 
and correction. Bell Labs Tech. Mem. 

161 SLEATOR, D. a., AND TEMPERLY, a. 1992. ParsLng Enghsh with a Link Grammar. Source code 
via internet host: spade. pc.cs. emu. edu:/usr/ sleator/pubhc. Carnegie-Mellon Univ., Pittsburgh, Pa. 

162 SMADJA, F. 1991a From n-grams to collocations: An evaluation of XTRACT. In Proceedzngs of 
the 29th Ahnual Meetzng of the Assoczatlon for Computational Linguistics (Berkeley, Calif., June). 
ACL, 279 284. 

163 Frank Albert Smadja, Extracting collocations from text. An application: language generation, 
Colu mbia Univ er sity, New York , NY , 1992 

164 SMADJA, F., AND McKEOWN, K. 1990. Automatically extracting and representing collocations 
for language generation. In Proceedings of the 28th Annual Meeting of the Association for 
Computational LlnguLetics, (Pittsburgh, Pa., June). ACL, 252-259. 

165 SPENKE, M., BEILKEN, C, MATTERN, F., MEVENKAMP, M., AND H. M. 1984. A language 
independent error recovery method for LL(1) parsers. Softw. Pract. Exp. 14, 11. 

166 SRItiARI, S., El). 1984. Computer Text Recognitzon and Error Correctwn. IEEE Computer 
Society Press, Plscataway, N.J 

167 Sargur N. Srihari, Jonathan J. Hull, Ramesh Choudhari, Integrating diverse knowledge sources 
in text recognition, ACM Transactions on Information Systems (TOIS), v.1 n.1, p.68-87, Jan. 1983 

168 SURI, L. Z. 1991. Language transfer: A foundation for correcting the written English of ASL 
signers. Tech. Rep. No. 91-19, Dept. of Computer and Information Sciences, Univ. of Delaware, 
Newark, Del. 

169 SURI, L. Z., AND McCOY, K. F. 1991. Language transfer in deaf writing: A correction 
methodology for an instructional system. Tech. Rep. No. 91-20, Dept. of Computer and Information 
Sciences, Univ. of Delaware, Newark, Del. 

170 TAYLOR, W. D. 1981. GROPE-A spelling error correction tool. AT&T Bell Labs Tech. Mem. 

171 TENCZAR, P., AND GOLDEN, W. 1972. CERL Report X-35. Computer-Based Education Research 
Lab., Univ. of Illinois, Urbana, Ill. 

172 THOMPSON, B. H. 1980. Linguistic analysis of natural language communication with computers. 
In Proceedings of the 8th International Conference on Computational Linguistics (Tokyo, Japan), 
190-201. 






173 TOUSSAINT, G. T. 1978. The use of context in pattern recognition. Patt. Recog. 10, 189-204. 

174 TRAWICK, D. J. 1983. Robust sentence analysis and habitability. Ph.D. dissertation, California 
Inst. of Technology, Pasadena, Calif. 

175 TROY, P. L. 1990. Combining probabilistic sources with lexical distance measures for spelling 
correction. Bellcore Tech. Memo., Bellcore, Morristown, N.J. 

176 TSAO, Y. C. 1990. A lexical study of sentences typed by hearing-impaired TDD users. In 
Proceedings of the 13th International Symposium on Human Factors in Telecommunications (Turin, 
Italy, Sept.), 197-201. 

177 Thomas N. Turba, Checking for spelling and typographical errors in computer-based text, 
Proceedings of the ACM SIGPLAN SIGOA symposium on Text manipulation, p.51-60, June 08-10, 
1981, Portland, Oregon, United States 

178 ULLMANN, J. R. 1977. A binary n-gram technique for automatic correction of substitution, 
deletion, insertion and reversal errors in words. Comput. J. 20, 141-147. 

179 VAN BERKEL, B., AND DESMEDT, K. 1988. Triphone analysis: A combined method for the 
correction of orthographical and typographical errors. In Proceedings of the 2nd Applied Natural 
Language Processing Conference (Austin, Tex., Feb.). Association for Computational Linguistics 
(ACL). 

180 VERONIS, J. 1988a. Computerized correction of phonographic errors. Comput. Hum. 22, 43-56. 

181 VERONIS, J. 1988b. Morphosyntactic correction in natural language interfaces. In Proceedings of 
the 12th International Conference on Computational Linguistics (Budapest, Hungary), 708-713. 

182 VOSSE, T. 1992. Detecting and correcting morpho-syntactic errors in real texts. In Proceedings 
of the 3rd Conference on Applied Natural Language Processing (Trento, Italy, Mar. 31-Apr. 3). ACL, 
111-118. 

183 Robert A. Wagner, Order-n correction for regular languages, Communications of the ACM, v.17 
n.5, p.265-268, May 1974 

184 Robert A. Wagner, Michael J. Fischer, The String-to-String Correction Problem, Journal of the 
ACM (JACM), v.21 n.1, p.168-173, Jan. 1974 

185 WALKER, D. E. 1991. The ecology of language. In Proceedings of the International Workshop 
on Electronic Dictionaries (Feb.). Japan Electronic Dictionary Research Institute, Tokyo, 10-22. 

186 WALKER, D. E., AND AMSLER, R. A. 1986. The use of machine-readable dictionaries in 
sublanguage analysis. In Analyzing Language in Restricted Domains: Sublanguage Description and 
Processing. Lawrence Erlbaum, Hillsdale, N.J., 69-83. 

187 David L. Waltz, An English language question answering system for a large relational database, 
Communications of the ACM, v.21 n.7, p.526-539, July 1978 

188 Webster's New World Misspelled Dictionary. Simon and Schuster, New York. 

189 WEISCHEDEL, R. M., AND SONDHEIMER, N. K. 1983. Meta-rules as a basis for processing 
ill-formed input. Amer. J. Comput. Ling. 9, 3-4 (July-Dec.), 161-177. 






190 WING, A. M., AND BADDELEY, A. D. 1980. Spelling errors in handwriting: A corpus and 
distributional analysis. In Cognitive Processes in Spelling, U. Frith, Ed. Academic Press, London. 

191 C. K. Wong, Ashok K. Chandra, Bounds for the String Editing Problem, Journal of the ACM 
(JACM), v.23 n.1, p.13-16, Jan. 1976 

192 WRIGHT, A. G., AND NEWELL, A. F. 1991. Computer help for poor spellers. Brit. J. Educ. Tech. 
22, 2 (Feb.), 146-148. 

193 YANNAKOUDAKIS, E. J., AND FAWTHROP, D. 1983a. An intelligent spelling correcter. Inf. 
Process. Manage. 19, 12, 101-108. 

194 YANNAKOUDAKIS, E. J., AND FAWTHROP, D. 1983b. The rules of spelling errors. Inf. Process. 
Manage. 19, 2, 87-99. 

195 YOUNG, C. W., EASTMAN, C. M., AND OAKMAN, R. L. 1991. An analysis of ill-formed input in 
natural language queries to document retrieval systems. Inf. Process. Manage. 27, 6, 615-622. 

196 ZAMORA, E. M., POLLOCK, J. J., AND ZAMORA, A. 1981. The use of trigram analysis for 
spelling error detection. Inf. Process. Manage. 17, 6, 305-316. 

197 ZIPF, G. K. 1935. The Psycho-Biology of Language. Houghton Mifflin, Boston. 



* CITINGS 17 

Jianhua Li, Xiaolong Wang, Combining trigram and automatic weight distribution in Chinese spelling 
error correction, Journal of Computer Science and Technology, v.17 n.6, p.915-923, November 2002 

Jin Hu Huang, David Powers, Large scale experiments on correction of confused words, Australian 
Computer Science Communications, v.23 n.1, p.77-82, January-February 2001 

James C. French, Allison L. Powell, Eric Schulman, Applications of approximate word matching in 
information retrieval, Proceedings of the sixth international conference on Information and knowledge 
management, p.9-15, November 10-14, 1997, Las Vegas, Nevada, United States 

Robert Garfinkel, Elena Fernandez, Ram Gopal, Design of an interactive spell checker: optimizing 
the list of offered words, Decision Support Systems, v.35 n.3, p.385-397, June 2003 

Atsuhiro Takasu, Bibliographic attribute extraction from erroneous references based on a statistical 
model, Proceedings of the third ACM/IEEE-CS joint conference on Digital libraries, May 27-31, 2003, 
Houston, Texas 

Mauricio A. Hernandez, Salvatore J. Stolfo, The merge/purge problem for large databases, ACM 
SIGMOD Record, v.24 n.2, p.127-138, May 1995 

Sheila Tejada, Craig A. Knoblock, Steven Minton, Learning domain-independent string 
transformation weights for high accuracy object identification, Proceedings of the eighth ACM SIGKDD 
international conference on Knowledge discovery and data mining, July 23-26, 2002, Edmonton, 
Alberta, Canada 

Kenneth W. Church, Lisa F. Rau, Commercial applications of natural language processing, 
Communications of the ACM, v.38 n.11, p.71-79, Nov. 1995 






Stefan Berchtold, Christian Bohm, Bernhard Braunmuller, Daniel A. Keim, Hans-Peter Kriegel, Fast 
parallel similarity search in multimedia databases, ACM SIGMOD Record, v.26 n.2, p.1-12, June 1997 

Gonzalo Navarro, Ricardo Baeza-Yates, Joao Marcelo Azevedo Arcoverde, Matchsimile: a flexible 
approximate matching tool for searching proper names, Journal of the American Society for 
Information Science and Technology, v.54 n.1, p.3-15, January 2003 

Stefan Berchtold, Christian Bohm, Daniel A. Keim, Hans-Peter Kriegel, A cost model for nearest 
neighbor search in high-dimensional data space, Proceedings of the sixteenth ACM 
SIGACT-SIGMOD-SIGART symposium on Principles of database systems, p.78-86, May 11-15, 1997, 
Tucson, Arizona, United States 

Barbara Staudt Lerner, A model for compound type changes encountered in schema evolution, ACM 
Transactions on Database Systems (TODS), v.25 n.1, p.83-127, March 2000 

King Ip Lin, H. V. Jagadish, Christos Faloutsos, The TV-tree: an index structure for high-dimensional 
data, The VLDB Journal - The International Journal on Very Large Data Bases, v.3 n.4, October 1994 

Jonathan D. Cohen, Recursive hashing functions for n-grams, ACM Transactions on Information 
Systems (TOIS), v.15 n.3, p.291-320, July 1997 

Christos Faloutsos, King-Ip Lin, FastMap: a fast algorithm for indexing, data-mining and visualization 
of traditional and multimedia datasets, ACM SIGMOD Record, v.24 n.2, p.163-174, May 1995 

Gonzalo Navarro, A guided tour to approximate string matching, ACM Computing Surveys (CSUR), 
v.33 n.1, p.31-88, March 2001 

Christian Bohm, Stefan Berchtold, Daniel A. Keim, Searching in high-dimensional spaces: Index 
structures for improving the performance of multimedia databases, ACM Computing Surveys (CSUR), 
v.33 n.3, p.322-373, September 2001 



* INDEX TERMS 

Primary Classification: 

I. Computing Methodologies 
  I.2 ARTIFICIAL INTELLIGENCE 
    I.2.7 Natural Language Processing 
      Subjects: Text analysis 

Additional Classification: 

I. Computing Methodologies 
  I.2 ARTIFICIAL INTELLIGENCE 
    I.2.7 Natural Language Processing 
      Subjects: Language models; Language parsing and understanding 
  I.5 PATTERN RECOGNITION 
    I.5.1 Models 
      Subjects: Statistical; Neural nets 
  I.7 DOCUMENT AND TEXT PROCESSING 
    I.7.1 Document and Text Editing 
      Subjects: Spelling 



General Terms: 

Algorithms, Experimentation, Human Factors, Performance, Theory 



Keywords: 

n-gram analysis, Optical Character Recognition (OCR), context-dependent spelling correction, 
grammar checking, natural-language-processing models, neural net classifiers, spell checking, 
spelling error detection, spelling error patterns, statistical-language models, word recognition and 
correction 



* REVIEW 

Graeme J. Hirst 

It is often easy to tell when a poor speller or poor typist has used a spelling checker on a document: 
each word is correctly spelled, but not all are the words that the author intended. And optical 
character recognition of documents, with its occasional misrecognitions, has given the world a whole 
new source of spelling errors. Although spelling checkers (sometimes called "spell checkers" by 
people who need syntax checkers) have been available for many years now, there is still much room 
for improvement. In this paper, Kukich presents a careful and exhaustive survey of the techniques, 
many of them fascinating and ingenious, that have been developed for efficiently finding and 
correcting errors in spelling; she summarizes each method and its strengths and weaknesses. The 
problem divides into two parts: detecting an error, which might be a non-word or a wrong real word; 
and correcting such errors, either in isolation or in context. Non-word detection is the easiest form of 
the problem, and so the simplest spelling checkers are those that merely draw the user's attention to 
suspect words. The main techniques used are n-gram probabilities (for example, the trigram fkh has 
zero chance of occurring in an English word) and lexicons of correctly spelled words (which must be 
neither too big nor too small). Kukich finds the former better for detecting OCR errors, the latter 
better for human typing. To correct the possible error, once it is found, a set of candidate corrections 
must be generated and ranked. These may be presented to the user for the final judgment, or the 
substitution may be automatic. Kukich reviews a wide variety of methods (including minimum edit 
distance, similarity keys, the Viterbi algorithm, and neural nets) but finds none wholly satisfactory; 
in particular, neural nets, which might have been thought to be ideally suited to a problem of this 
kind, require a prohibitive amount of training. The hardest form of the problem is the detection and 
correction of erroneous real words, which generally requires some linguistic knowledge (and, in the 
worst case, a complete understanding of the meaning of the text). For example, a parser can 
determine when a real-word error causes a syntax error in the sentence; this technique is the basis 
for many grammar-based writer's aids, such as CRITIQUE [1]. Word bigram or trigram probabilities, 
derived from large text corpora, can improve other techniques. Because it admits so many different 
kinds of approaches, spelling checking is a problem that attracts an audience from many different 
subfields of computing. Despite much effort and many clever ideas, it remains far from solved. 
Kukich's review will become the definitive reference for work done up to this point; any computer 
scientist will enjoy reading it. 

Online Computing Reviews Service 
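
The detect-then-rank pipeline described in the review (lexicon lookup for non-word detection, then candidate corrections ranked by minimum edit distance) can be sketched roughly as follows. This is a minimal illustration, not any system covered by the survey: the toy `LEXICON`, the function names, and the `max_distance` threshold are all assumptions made for the example.

```python
from typing import AbstractSet, List, Tuple


def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions turning a into b (standard dynamic program)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def is_nonword(word: str, lexicon: AbstractSet[str]) -> bool:
    """Non-word detection: flag any string absent from the lexicon."""
    return word.lower() not in lexicon


def rank_candidates(word: str, lexicon: AbstractSet[str],
                    max_distance: int = 2) -> List[Tuple[int, str]]:
    """Return (distance, entry) pairs within max_distance, closest first."""
    scored = ((edit_distance(word.lower(), entry), entry) for entry in lexicon)
    return sorted(pair for pair in scored if pair[0] <= max_distance)


# Hypothetical toy lexicon, for illustration only.
LEXICON = frozenset({"spelling", "spilling", "checker", "correct",
                     "context", "their", "there"})

if __name__ == "__main__":
    print(is_nonword("speling", LEXICON))       # flagged as a non-word
    print(rank_candidates("speling", LEXICON))  # ranked candidate corrections
    print(is_nonword("there", LEXICON))         # a real-word error is not flagged
```

The last call illustrates the review's point about real-word errors: "there" typed for "their" passes the lexicon check, so catching it needs context beyond isolated-word lookup.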

To date, almost all research work in the Content-Based Image Retrieval (CBIR) community 
has used Minkowski-like functions to measure similarity between images. In this paper, we 
first present a non-metric distance function, dynamic partial function (DPF), which works 

This paper describes a new unified representation for the information in a video. We reduce 
the dimensionality of the signal with either a singular-value decomposition (on the semantic 
and image data) or mel-frequency cepstral coefficients (on the audio data) and then 
concatenate the vectors to form a multi-dimensional representation of the video. Using 
scale-space techniques we find large jumps in the video's path, which we call edges. We use 
these techniques to analyze the temporal properti ... 

Most speech interfaces are based on natural language processing techniques that use pre- 
defined symbolic representations of word meanings and process only linguistic information. 
To understand and use language like their human counterparts in multimodal human- 
computer interaction, computers need to acquire spoken language and map it to other 
sensory perceptions. This paper presents a multimodal interface that learns to associate 
spoken language with perceptual features by being situated in users ... 

We compress storage and accelerate performance of precomputed radiance transfer (PRT), 
which captures the way an object shadows, scatters, and reflects light. PRT records over 
many surface points a transfer matrix. At run-time, this matrix transforms a vector of 
spherical harmonic coefficients representing distant, low-frequency source lighting into 
exiting radiance. Per-point transfer matrices form a high-dimensional surface signal that we 
compress using clustered principal component analysi ... 

A general-purpose computer vision system must be capable of recognizing three- 
dimensional (3-D) objects. This paper proposes a precise definition of the 3-D object 
recognition problem, discusses basic concepts associated with this problem, and reviews the 
relevant literature. Because range images (or depth maps) are often used as sensor input 
instead of intensity images, techniques for obtaining, processing, and characterizing range 
data are also surveyed. 

This paper presents the results of more than 10 years of transdisciplinary work. The initial 
idea was: can the laws of Nature also been found of rebuilt, independently from theoretical 
research in Physics (on elementary particles and matter in general), also in the field of 
Computer Science i.e. Information Processing? Pressing a lemon reveals its juice and 
stones; if one "tortures" matter, the components of its (first electrons, neutrons and 
protons, then quarks and gluons at a ... 

This article reviews the available methods for automated identification of objects in digital 
images. The techniques are classified into groups according to the nature of the 
computational strategy used. Four classes are proposed: (1) the simplest strategies, which 
work on data appropriate for feature vector classification, (2) methods that match models to 
symbolic data structures for situations involving reliable data and complex models, (3) 
approaches that fit models to the photometry and ... 

Keywords: image understanding, model-based vision, object recognition 

Sylvain Petitjean 

In a variety of practical situations such as reverse engineering of boundary representation 
from depth maps of scanned objects, range data analysis, model-based recognition and 
algebraic surface design, there is a need to recover the shape of visible surfaces of a dense 
3D point set. In particular, it is desirable to identify and fit simple surfaces of known type 
wherever these are in reasonable agreement with the data. We are interested in the class of 
quadric surfaces, that is, algebraic surfa ... 

Keywords: Data fitting, geometry enhancement, local geometry estimation, mesh fairing, 

Using visualization techniques to explore and understand high-dimensional data is an 
efficient way to combine human intelligence with the immense brute force computation 
power available nowadays. Several visualization techniques have been developed to study 
the cluster structure of data, i.e., the existence of distinctive groups in the data and how 
these clusters are related to each other. However, only few of these techniques lend 
themselves to studying how this structure changes if the feature ... 

Keywords: high-dimensional data, interactive data mining 

Pierre Baldi, Gianluca Pollastri 

September 2003 The Journal of Machine Learning Research, volume 4 

We describe a general methodology for the design of large-scale recursive neural network 
architectures (DAG-RNNs) which comprises three fundamental steps: (1) representation of a 
given domain using suitable directed acyclic graphs (DAGs) to connect visible and hidden 
node variables; (2) parameterization of the relationship between each variable and its 
parent variables by feedforward neural networks; and (3) application of weight-sharing 
within appropriate subsets of DAG connections to capture s ... 

14 Partial-order transport service for multimedia and other applications 

G. C. Fox, W. Furmanski 

January 1988 Proceedings of the third conference on Hypercube concurrent computers 
and applications: Architecture, software, computer systems, and general 
issues - Volume 1 

We discuss optimal communication and decomposition algorithms for a class of regular 
problems on concurrent computers with a hypercube topology, using a general technique we 
call the method of cube geodesics. We address the calculation of various transformations 
( convolutions, functionals etc. ) of data distributed over the hypercube; examples are the 
Fast Fourier Transform, matrix algorithms, global scalar products and vector sums, sorting. 
These all involve long distance inter ... 

September 1986 Proceedings of the 9th annual international ACM SIGIR conference on 
Research and development in information retrieval 


This paper gives an overview of the graphical techniques which have been used in the 
representation of information in a document collection environment. An assessment of the 
applicability of existing multivariate data graphical techniques to the vector space model is 
presented. 

17 Load balancing loosely synchronous problems with a neural network 
G. C. Fox, W. Furmanski 

January 1988 Proceedings of the third conference on Hypercube concurrent computers 
and applications: Architecture, software, computer systems, and general 
issues - Volume 1 


Hopfield and Tank have introduced the use of neural networks for the solution of 
optimization problems such as the traveling salesman problem. Here we show how to 
generalize this method to decompose loosely synchronous problems onto parallel machines 
and in particular the hypercube. In this case, decomposition or load balancing can be 
formulated graph theoretically in terms of optimal partitioning of the computational graph 
into N = 2 

18 Authoritative sources in a hyperlinked environment 
Jon M. Kleinberg 

September 1999 Journal of the ACM (JACM), volume 46 issue 5 


The network structure of a hyperlinked environment can be a rich source of information 
about the content of the environment, provided we have effective means for understanding 
it. We develop a set of algorithmic tools for extracting information from the link structures of 
such environments, and report on experiments that demonstrate their effectiveness in a 
variety of contexts on the World Wide Web. The central issue we address within our 
framework is the distillation of broad search topics, ... 

Keywords: World Wide Web, graph algorithms, hypertext structure, link analysis 

20 Kernel independent component analysis 
Francis R. Bach, Michael I. Jordan 

March 2003 The Journal of Machine Learning Research, volume 3 

We present a class of algorithms for independent component analysis (ICA) which use 
contrast functions based on canonical correlations in a reproducing kernel Hilbert space. On 
the one hand, we show that our contrast functions are related to mutual information and 
have desirable mathematical properties as measures of statistical dependence. On the other 
hand, building on recent developments in kernel methods, we show that these criteria and 
their derivatives can be computed efficiently. Minimizi ... 

incomplete Cholesky decomposition, independent component analysis, integral equations, 
kernel methods, mutual information, semiparametric models 

TITLE: Mask controled neural networks 

DATE-ISSUED: May 7, 1991 

INVENTOR-INFORMATION : 

NAME CITY STATE ZIP CODE COUNTRY 

White; James A. New Brighton MN 55112 

US-CL-CURRENT: 706/25; 382/157 

L2 and probabil$6 and correlation    7 

Refine Search 

Search Results - 

Terms    Documents 

L1 and metric and vector$2 and signal$2 and learn$4 and probabil$6 and correlation and matrix    7 



US Pre-Grant Publication Full-Text Database 

US Patents Full-Text Database 

US OCR Full-Text Database 

EPO Abstracts Database 

JPO Abstracts Database 

Derwent World Patents Index 

IBM Technical Disclosure Bulletins 






Search History 



DATE: Tuesday, December 30, 2003 

Set Name (side by side)    Query    Hit Count    Set Name (result set) 

DB=PGPB,USPT,USOC,EPAB,JPAB,DWPI,TDBD; PLUR=NO; OP=OR 

L4    L1 and metric and vector$2 and signal$2 and learn$4 and probabil$6 and correlation and matrix    7    L4 
L3    L2 and probabil$6 and correlation    7    L3 
L2    L1 and metric and vector$2 and signal$2 and learn$4    32    L2 
L1    706/25.ccls.    795    L1 

END OF SEARCH HISTORY 

Hit List 

Search Results - Record(s) 1 through 7 of 7 returned. 

□ 1. Document ID: US 6009418 A 

L4: Entry 1 of 7 File: USPT Dec 28, 1999 

US-PAT-NO: 6009418 

DOCUMENT-IDENTIFIER: US 6009418 A 

TITLE: Method and apparatus for neural networking using semantic attractor 
architecture 

DATE-ISSUED: December 28, 1999 

INVENTOR-INFORMATION: 

NAME              CITY     STATE  ZIP CODE  COUNTRY 

Cooper; David L.  Fairfax  VA     22033 

US-CL-CURRENT: 706/15; 706/16, 706/25, 706/26, 706/27 



□ 2. Document ID: US 5850470 A 

L4: Entry 2 of 7 File: USPT Dec 15, 1998 

US-PAT-NO: 5850470 

DOCUMENT-IDENTIFIER: US 5850470 A 

TITLE: Neural network for locating and recognizing a deformable object 

DATE-ISSUED: December 15, 1998 

INVENTOR-INFORMATION: 

NAME             CITY          STATE  ZIP CODE  COUNTRY 

Kung; Sun-Yuan   Princeton     NJ 
Lin; Shang-Hung  Princeton     NJ 
Lin; Long-Ji     Kendall Park  NJ 
Fang; Ming       Cranbury      NJ 

US-CL-CURRENT: 382/157; 382/116, 382/117, 382/118, 382/159, 706/20, 706/25 

□ 3. Document ID: US 5774632 A 

L4: Entry 3 of 7 File: USPT Jun 30, 1998 

US-PAT-NO: 5774632 

DOCUMENT-IDENTIFIER: US 5774632 A 

TITLE: Method and device for the control of an autonomously exploring robot 

DATE-ISSUED: June 30, 1998 

INVENTOR-INFORMATION: 

NAME              CITY        STATE  ZIP CODE  COUNTRY 

Kaske; Alexander  50933 Koln                   DE 

US-CL-CURRENT: 706/25; 700/253, 706/15 



□ 4. Document ID: US 5621861 A 

L4: Entry 4 of 7 File: USPT Apr 15, 1997 

US-PAT-NO: 5621861 

DOCUMENT-IDENTIFIER: US 5621861 A 

TITLE: Method of reducing amount of data required to achieve neural network 
learning 

DATE-ISSUED: April 15, 1997 
INVENTOR-INFORMATION : 

NAME CITY STATE ZIP CODE COUNTRY 

Hayashi; Masaaki Yokohama JP 

Takahashi; Takumi Yokohama JP 



US-CL-CURRENT: 706/25; 706/20 






□ 5. Document ID: US 5568591 A 

L4: Entry 5 of 7 File: USPT Oct 22, 1996 

US-PAT-NO: 5568591 

DOCUMENT-IDENTIFIER: US 5568591 A 

TITLE: Method and device using a neural network for classifying data 

DATE-ISSUED: October 22, 1996 

INVENTOR-INFORMATION: 

NAME               CITY       STATE  ZIP CODE  COUNTRY 

Minot; Joel        Charenton                   FR 
Gentric; Philippe  Paris                       FR 

US-CL-CURRENT: 706/25; 706/20 



□ 6. Document ID: US 5455892 A 

L4: Entry 6 of 7 File: USPT Oct 3, 1995 

US-PAT-NO: 5455892 

DOCUMENT-IDENTIFIER: US 5455892 A 

TITLE: Method for training a neural network for classifying an unknown signal with 
respect to known signals 

DATE-ISSUED: October 3, 1995 

INVENTOR-INFORMATION: 

NAME               CITY       STATE  ZIP CODE  COUNTRY 

Minot; Joel        Charenton                   FR 
Gentric; Philippe  Paris                       FR 

US-CL-CURRENT: 706/25; 706/20 



□ 7. Document ID: US 5014219 A 

L4: Entry 7 of 7 File: USPT May 7, 1991 

US-PAT-NO: 5014219 

DOCUMENT-IDENTIFIER: US 5014219 A 

** See image for Certificate of Correction ** 

TITLE: Mask controled neural networks 

DATE-ISSUED: May 7, 1991 

INVENTOR-INFORMATION: 

NAME CITY STATE ZIP CODE COUNTRY 

White; James A. New Brighton MN 55112 

US-CL-CURRENT: 706/25; 382/157 

Terms    Documents 

L1 and metric and vector$2 and signal$2 and learn$4 and probabil$6 and correlation and matrix 

12/30/03 

WEST Refine Search 

Page 1 of 1 

Refine Search 

Search Results - 

Terms    Documents 

L5 and keyword    29 



US Pre-Grant Publication Full-Text Database 

US Patents Full-Text Database 

US OCR Full-Text Database 

EPO Abstracts Database 

JPO Abstracts Database 

Derwent World Patents Index 

IBM Technical Disclosure Bulletins 






Search History 



DATE: Tuesday, December 30, 2003 

Set Name (side by side)    Query    Hit Count    Set Name (result set) 

DB=PGPB,USPT,USOC,EPAB,JPAB,DWPI,TDBD; PLUR=NO; OP=OR 

L6    L5 and keyword    29    L6 
L5    metric and vector$2 and signal$2 and learn$4 and probabil$6 and correlation and matrix    240    L5 
L4    L1 and metric and vector$2 and signal$2 and learn$4 and probabil$6 and correlation and matrix    7    L4 
L3    L2 and probabil$6 and correlation    7    L3 
L2    L1 and metric and vector$2 and signal$2 and learn$4    32    L2 
L1    706/25.ccls.    795    L1 



END OF SEARCH HISTORY 

Hit List 

Search Results - Record(s) 1 through 29 of 29 returned. 

□ 1. Document ID: US 20030217047 A1 

L6: Entry 1 of 29 File: PGPB Nov 20, 2003 

PGPUB-DOCUMENT-NUMBER: 20030217047 
PGPUB-FILING-TYPE: new 

DOCUMENT-IDENTIFIER: US 20030217047 A1 

TITLE: Inverse inference engine for high performance web search 

PUBLICATION-DATE: November 20, 2003 

INVENTOR-INFORMATION: 

NAME                    CITY      STATE  COUNTRY  RULE-47 

Marchisio, Giovanni B.  Kirkland  WA     US 

US-CL-CURRENT: 707/3 



□ 2. Document ID: US 20030182246 A1 

L6: Entry 2 of 29 File: PGPB Sep 25, 2003 

PGPUB-DOCUMENT-NUMBER: 20030182246 
PGPUB-FILING-TYPE: new 

DOCUMENT-IDENTIFIER: US 20030182246 A1 

TITLE: Applications of fractal and/or chaotic techniques 

PUBLICATION-DATE: September 25, 2003 

INVENTOR-INFORMATION: 

NAME                          CITY            STATE  COUNTRY  RULE-47 

Johnson, William Nevil Heaton  St. Peter Port         GB 
Blackledge, Jonathan Michael   Leicester              GB 
Murray, Bruce Lawrence John    Barcombe               GB 

US-CL-CURRENT: 705/76; 380/201, 380/278 

□ 3. Document ID: US 20030172043 A1 

L6: Entry 3 of 29 File: PGPB Sep 11, 2003 

PGPUB-DOCUMENT-NUMBER: 20030172043 
PGPUB-FILING-TYPE: new 

DOCUMENT-IDENTIFIER: US 20030172043 A1 

TITLE: Methods of identifying patterns in biological systems and uses thereof 

PUBLICATION-DATE: September 11, 2003 

INVENTOR-INFORMATION: 

NAME             CITY                  STATE  COUNTRY  RULE-47 

Guyon, Isabelle  Berkeley              CA     US 
Weston, Jason    St. Leonard's on Sea         GB 

US-CL-CURRENT: 706/48 



□ 4. Document ID: US 20030037041 A1 

L6: Entry 4 of 29 File: PGPB Feb 20, 2003 

PGPUB-DOCUMENT-NUMBER: 20030037041 
PGPUB-FILING-TYPE: new 

DOCUMENT-IDENTIFIER: US 20030037041 A1 

TITLE: System for automatic determination of customized prices and promotions 

PUBLICATION-DATE: February 20, 2003 

INVENTOR-INFORMATION: 

NAME                   CITY   STATE  COUNTRY  RULE-47 

Hertz, Frederick S. M  Davis  WV     US 

US-CL-CURRENT: 707/1 



□ 5. Document ID: US 20020174120 A1 

L6: Entry 5 of 29 File: PGPB Nov 21, 2002 

PGPUB-DOCUMENT-NUMBER: 20020174120 
PGPUB-FILING-TYPE: new 

DOCUMENT-IDENTIFIER: US 20020174120 A1 

TITLE: Relevance maximizing, iteration minimizing, relevance-feedback, 
content-based image retrieval (CBIR) 

PUBLICATION-DATE: November 21, 2002 

INVENTOR-INFORMATION: 

NAME               CITY      STATE  COUNTRY  RULE-47 

Zhang, Hong-Jiang  Beijing          CN 
Su, Zhong          Beijing          CN 
Zhu, Xingquan      Shanghai         CN 

US-CL-CURRENT: 707/7 



6. Document ID: US 20020156763 A1

L6: Entry 6 of 29 File: PGPB Oct 24, 2002

PGPUB-DOCUMENT-NUMBER: 20020156763
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20020156763 A1

TITLE: Extended functionality for an inverse inference engine based web search

PUBLICATION-DATE: October 24, 2002

INVENTOR-INFORMATION:

NAME                    CITY      STATE  COUNTRY  RULE-47

Marchisio, Giovanni B.  Kirkland  WA     US

US-CL-CURRENT: 707/1



7. Document ID: US 20020116196 A1

L6: Entry 7 of 29 File: PGPB Aug 22, 2002

PGPUB-DOCUMENT-NUMBER: 20020116196
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20020116196 A1

TITLE: Speech recognizer

PUBLICATION-DATE: August 22, 2002

INVENTOR-INFORMATION:

NAME          CITY      STATE  COUNTRY  RULE-47

Tran, Bao Q.  San Jose  CA     US

US-CL-CURRENT: 704/270



8. Document ID: US 20020099676 A1

L6: Entry 8 of 29 File: PGPB Jul 25, 2002

PGPUB-DOCUMENT-NUMBER: 20020099676
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20020099676 A1

TITLE: Method for filtering information including information data and keyword
attached thereto

PUBLICATION-DATE: July 25, 2002

INVENTOR-INFORMATION:

NAME            CITY      STATE  COUNTRY  RULE-47

Kindo, Toshiki  Yokohama         JP

US-CL-CURRENT: 706/16



9. Document ID: US 20020069218 A1

L6: Entry 9 of 29 File: PGPB Jun 6, 2002

PGPUB-DOCUMENT-NUMBER: 20020069218
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20020069218 A1

TITLE: System and method for indexing, searching, identifying, and editing portions
of electronic multimedia files

PUBLICATION-DATE: June 6, 2002

INVENTOR-INFORMATION:

NAME              CITY          STATE  COUNTRY  RULE-47

Sull, Sanghoon    Seoul                KR
Kim, Hyeokman     Seoul                KR
Choi, Hyungseok   Seoul                KR
Chung, Min Gyo    Sungnam City         KR
Yoon, Ja-Cheon    Seoul                KR
Oh, Jeongtaek     Seoul                KR
Lee, Sangwook     Seoul                KR
Song, S. Moon-Ho  Seoul                KR
Kim, Jung Rim     Seoul                KR
Lee, Keansub      Suwon City           KR
Chun, Seong Soo   Songnam City         KR
Oh, Sangwook      Cheju City           KR
Kim, Yunam        Cheju City           KR

US-CL-CURRENT: 715/501.1




10. Document ID: US 20010047345 A1

L6: Entry 10 of 29 File: PGPB Nov 29, 2001

PGPUB-DOCUMENT-NUMBER: 20010047345
PGPUB-FILING-TYPE: new

DOCUMENT-IDENTIFIER: US 20010047345 A1

TITLE: Information filtering method and apparatus for preferentially taking out
information having a high necessity

PUBLICATION-DATE: November 29, 2001

INVENTOR-INFORMATION:

NAME            CITY      STATE  COUNTRY  RULE-47

Kindo, Toshiki  Yokohama         JP

US-CL-CURRENT: 706/12



11. Document ID: US 6647378 B2

L6: Entry 11 of 29 File: USPT Nov 11, 2003

US-PAT-NO: 6647378

DOCUMENT-IDENTIFIER: US 6647378 B2

TITLE: Information filtering method and apparatus for preferentially taking out
information having a high necessity

DATE-ISSUED: November 11, 2003

INVENTOR-INFORMATION:

NAME            CITY      STATE  ZIP CODE  COUNTRY

Kindo; Toshiki  Yokohama                   JP

US-CL-CURRENT: 706/21; 706/20



12. Document ID: US 6633878 B1

L6: Entry 12 of 29 File: USPT Oct 14, 2003

US-PAT-NO: 6633878

DOCUMENT-IDENTIFIER: US 6633878 B1

TITLE: Initializing an ecommerce database framework
DATE-ISSUED: October 14, 2003

INVENTOR-INFORMATION:

NAME                  CITY        STATE  ZIP CODE  COUNTRY

Underwood; Roy Aaron  Long Grove  IL

US-CL-CURRENT: 707/100; 707/1, 707/102, 707/205



13. Document ID: US 6609128 B1

L6: Entry 13 of 29 File: USPT Aug 19, 2003

US-PAT-NO: 6609128

DOCUMENT-IDENTIFIER: US 6609128 B1

TITLE: Codes table framework design in an E-commerce architecture
DATE-ISSUED: August 19, 2003

INVENTOR-INFORMATION:

NAME                  CITY        STATE  ZIP CODE  COUNTRY

Underwood; Roy Aaron  Long Grove  IL

US-CL-CURRENT: 707/10; 707/200



14. Document ID: US 6601233 B1

L6: Entry 14 of 29 File: USPT Jul 29, 2003

US-PAT-NO: 6601233

DOCUMENT-IDENTIFIER: US 6601233 B1
TITLE: Business components framework
DATE-ISSUED: July 29, 2003
INVENTOR-INFORMATION:

NAME                  CITY        STATE  ZIP CODE  COUNTRY

Underwood; Roy Aaron  Long Grove  IL

US-CL-CURRENT: 717/102; 717/100, 717/101, 717/103, 717/104, 717/106, 717/107



15. Document ID: US 6523027 B1

L6: Entry 15 of 29 File: USPT Feb 18, 2003

US-PAT-NO: 6523027

DOCUMENT-IDENTIFIER: US 6523027 B1

TITLE: Interfacing servers in a Java based e-commerce architecture
DATE-ISSUED: February 18, 2003

INVENTOR-INFORMATION:

NAME                  CITY        STATE  ZIP CODE  COUNTRY

Underwood; Roy Aaron  Long Grove  IL

US-CL-CURRENT: 707/4; 707/10, 707/100



16. Document ID: US 6510406 B1

L6: Entry 16 of 29 File: USPT Jan 21, 2003

US-PAT-NO: 6510406

DOCUMENT-IDENTIFIER: US 6510406 B1

TITLE: Inverse inference engine for high performance web search
DATE-ISSUED: January 21, 2003

INVENTOR-INFORMATION:

NAME                    CITY      STATE  ZIP CODE  COUNTRY

Marchisio; Giovanni B.  Kirkland  WA

US-CL-CURRENT: 704/9; 707/3



17. Document ID: US 6460036 B1

L6: Entry 17 of 29 File: USPT Oct 1, 2002

US-PAT-NO: 6460036

DOCUMENT-IDENTIFIER: US 6460036 B1

TITLE: System and method for providing customized electronic newspapers and target
advertisements

DATE-ISSUED: October 1, 2002
INVENTOR-INFORMATION:

NAME                   CITY   STATE  ZIP CODE  COUNTRY

Herz; Frederick S. M.  Davis  WV

US-CL-CURRENT: 707/10; 705/14, 707/2, 709/217, 725/14



18. Document ID: US 6341372 B1

L6: Entry 18 of 29 File: USPT Jan 22, 2002

US-PAT-NO: 6341372

DOCUMENT-IDENTIFIER: US 6341372 B1

TITLE: Universal machine translator of arbitrary languages
DATE-ISSUED: January 22, 2002

INVENTOR-INFORMATION:

NAME               CITY        STATE  ZIP CODE  COUNTRY

Datig; William E.  Centerport  NY     11721

US-CL-CURRENT: 717/136; 715/523



19. Document ID: US 6336108 B1

L6: Entry 19 of 29 File: USPT Jan 1, 2002

US-PAT-NO: 6336108

DOCUMENT-IDENTIFIER: US 6336108 B1

TITLE: Speech recognition with mixtures of bayesian networks
DATE-ISSUED: January 1, 2002

INVENTOR-INFORMATION:

NAME                       CITY         STATE  ZIP CODE  COUNTRY

Thiesson; Bo               Woodinville  WA
Meek; Christopher A.       Kirkland     WA
Chickering; David Maxwell  Redmond      WA
Heckerman; David Earl      Bellevue     WA
Alleva; Fileno A.          Redmond      WA
Hwang; Mei-Yuh             Redmond      WA

US-CL-CURRENT: 706/20; 704/256



20. Document ID: US 6327583 B1

L6: Entry 20 of 29 File: USPT Dec 4, 2001

US-PAT-NO: 6327583

DOCUMENT-IDENTIFIER: US 6327583 B1

TITLE: Information filtering method and apparatus for preferentially taking out
information having a high necessity

DATE-ISSUED: December 4, 2001

INVENTOR-INFORMATION:

NAME            CITY      STATE  ZIP CODE  COUNTRY

Kindo; Toshiki  Yokohama                   JP

US-CL-CURRENT: 706/45; 706/16, 706/21



21. Document ID: US 6286012 B1

L6: Entry 21 of 29 File: USPT Sep 4, 2001

US-PAT-NO: 6286012

DOCUMENT-IDENTIFIER: US 6286012 B1

TITLE: Information filtering apparatus and information filtering method
DATE-ISSUED: September 4, 2001

INVENTOR-INFORMATION:

NAME               CITY        STATE  ZIP CODE  COUNTRY

Kindo; Toshiki     Yokohama                     JP
Yoshida; Hideyuki  Sagamihara                   JP
Watanabe; Taisuke  Sagamihara                   JP

US-CL-CURRENT: 707/104.1; 707/9, 709/218, 709/222



22. Document ID: US 6233545 B1

L6: Entry 22 of 29 File: USPT May 15, 2001

US-PAT-NO: 6233545

DOCUMENT-IDENTIFIER: US 6233545 B1

TITLE: Universal machine translator of arbitrary languages utilizing epistemic
moments

DATE-ISSUED: May 15, 2001

INVENTOR-INFORMATION:

NAME               CITY        STATE  ZIP CODE  COUNTRY

Datig; William E.  Centerport  NY     11721

US-CL-CURRENT: 704/2; 704/9, 706/62



23. Document ID: US 6076082 A

L6: Entry 23 of 29 File: USPT Jun 13, 2000

US-PAT-NO: 6076082

DOCUMENT-IDENTIFIER: US 6076082 A

TITLE: Information filtering method and apparatus for preferentially taking out
information having a high necessity

DATE-ISSUED: June 13, 2000

INVENTOR-INFORMATION:

NAME            CITY      STATE  ZIP CODE  COUNTRY

Kindo; Toshiki  Yokohama                   JP

US-CL-CURRENT: 706/12; 706/14, 707/6, 707/7



24. Document ID: US 6070140 A

L6: Entry 24 of 29 File: USPT May 30, 2000

US-PAT-NO: 6070140

DOCUMENT-IDENTIFIER: US 6070140 A
TITLE: Speech recognizer
DATE-ISSUED: May 30, 2000
INVENTOR-INFORMATION:

NAME          CITY     STATE  ZIP CODE  COUNTRY

Tran; Bao Q.  Houston  TX     77099

US-CL-CURRENT: 704/275; 704/232



25. Document ID: US 6029195 A

L6: Entry 25 of 29 File: USPT Feb 22, 2000

US-PAT-NO: 6029195

DOCUMENT-IDENTIFIER: US 6029195 A

TITLE: System for customized electronic identification of desirable objects
DATE-ISSUED: February 22, 2000
INVENTOR-INFORMATION:

NAME                   CITY   STATE  ZIP CODE  COUNTRY

Herz; Frederick S. M.  Davis  WV     26260

US-CL-CURRENT: 725/116; 707/10, 725/93



26. Document ID: US 5835087 A

L6: Entry 26 of 29 File: USPT Nov 10, 1998

US-PAT-NO: 5835087

DOCUMENT-IDENTIFIER: US 5835087 A

TITLE: System for generation of object profiles for a system for customized
electronic identification of desirable objects

DATE-ISSUED: November 10, 1998

INVENTOR-INFORMATION:

NAME                   CITY          STATE  ZIP CODE  COUNTRY

Herz; Frederick S. M.  Davis         WV     26260
Eisner; Jason M.       Philadelphia  PA     19107
Ungar; Lyle H.         Philadelphia  PA     19103

US-CL-CURRENT: 345/810; 725/14, 725/35, 725/46



27. Document ID: US 5754939 A

L6: Entry 27 of 29 File: USPT May 19, 1998

US-PAT-NO: 5754939

DOCUMENT-IDENTIFIER: US 5754939 A

TITLE: System for generation of user profiles for a system for customized
electronic identification of desirable objects



the natural gradient in the Riemannian structur ...



2 Surveillance: Multi-camera spatio-temporal fusion and biased sequence-data learning
for security surveillance

Gang Wu, Yi Wu, Long Jiao, Yuan-Fang Wang, Edward Y. Chang
November 2003 Proceedings of the eleventh ACM international conference on
Multimedia

Full text available: pdf(281.72 KB) Additional Information: full citation, abstract, references, index terms

We present a framework for multi-camera video surveillance. The framework consists of 
three phases: detection, representation, and recognition. The detection phase handles 
multi-source spatio-temporal data fusion for efficiently and reliably extracting motion 
trajectories from video. The representation phase summarizes raw trajectory data to 
construct hierarchical, invariant, and content-rich descriptions of the motion events. Finally, 
the recognition ph ... 



3 A theory for memory-based learning
Jyh-Han Lin, Jeffrey Scott Vitter

July 1992 Proceedings of the fifth annual workshop on Computational learning theory

Full text available: pdf(1.24 MB) Additional Information: full citation, abstract, references, citings, index terms

A memory-based learning system is an extended memory management system that 
decomposes the input space either statically or dynamically into subregions for the purpose 
of storing and retrieving functional information. The main generalization techniques 
employed by memory-based learning systems are the nearest-neighbor search, space 
decomposition techniques, and clustering. Research on memory-based learning is still in its 
early stage. In particular, there are very few rigorous theoretical r ...



4 Session 9: image indexing and retrieval: DynDex: a dynamic and non-metric space
indexer

King-Shy Goh, Beitao Li, Edward Chang

December 2002 Proceedings of the tenth ACM international conference on Multimedia

Full text available: pdf(648.47 KB) Additional Information: full citation, abstract, references, citings

To date, almost all research work in the Content-Based Image Retrieval (CBIR) community 
has used Minkowski-like functions to measure similarity between images. In this paper, we 
first present a non-metric distance function, dynamic partial function (DPF), which works 
significantly better than Minkowski-like functions for measuring perceptual similarity; and 
we explain DPFs link to similarity theories in cognitive science. We then propose DynDex, an 
indexing method that deals with both the dynam ... 

Keywords: high-dimensional index, non-metric distance function, similarity search 



5 Supervised adaptive resonance networks 
R. S. Baxter 

May 1991 Proceedings of the conference on Analysis of neural network applications 

Full text available: pdf(1.44 MB) Additional Information: full citation, references, index terms



6 Clustered principal components for precomputed radiance transfer

Peter-Pike Sloan, Jesse Hall, John Hart, John Snyder

July 2003 ACM Transactions on Graphics (TOG), volume 22 issue 3

Full text available: pdf(9.29 MB) Additional Information: full citation, abstract, references

We compress storage and accelerate performance of precomputed radiance transfer (PRT), 
which captures the way an object shadows, scatters, and reflects light. PRT records over 
many surface points a transfer matrix. At run-time, this matrix transforms a vector of 
spherical harmonic coefficients representing distant, low-frequency source lighting into 
exiting radiance. Per-point transfer matrices form a high-dimensional surface signal that we 
compress using clustered principal component analyst ... 

Keywords: graphics hardware, illumination, monte carlo techniques, rendering, shadow 
algorithms 



7 FINkNN: a fuzzy interval number k-nearest neighbor classifier for prediction of sugar
production from populations of samples
Vassilios Petridis, Vassilis G. Kaburlasos

September 2003 The Journal of Machine Learning Research, volume 4
Full text available: pdf(360.76 KB) Additional Information: full citation, abstract

This work introduces FINkNN, a k-nearest-neighbor classifier operating over the metric 
lattice of conventional interval-supported convex fuzzy sets. We show that for problems 
involving populations of measurements, data can be represented by fuzzy interval numbers 
(FINs) and we present an algorithm for constructing FINs from such populations. We then 
present a lattice-theoretic metric distance between FINs with arbitrary-shaped membership 
functions, which forms the basis for FINkNN 1 ...



8 Multimedia communications, relevance feedback and indexing: Kernel VA-files for
relevance feedback retrieval
Douglas R. Heisterkamp, Jing Peng

November 2003 Proceedings of the first ACM international workshop on Multimedia
databases

Full text available: pdf(453.37 KB) Additional Information: full citation, abstract, references, index terms

Many data partitioning index methods perform poorly in high dimensional space and do not 
support relevance feedback retrieval. The vector approximation file (VA-File) approach 
overcomes some of the difficulties of high dimensional vector spaces, but cannot be applied 
to relevance feedback retrieval using kernel distances in the data measurement space. This 
paper introduces a novel KVA-File (kernel VA-File) that extends VA-File to kernel-based 
retrieval methods. A key observation is that kernel d ... 

Keywords: VA-Files flexible metrics, content-based image retrieval, indexing, kernel 
methods, relevance feedback 



9 An observability-based code coverage metric for functional simulation
Srinivas Devadas, Abhijit Ghosh, Kurt Keutzer

January 1997 Proceedings of the 1996 IEEE/ACM international conference on
Computer-aided design

Full text available: pdf(124.26 KB), Publisher Site Additional Information: full citation, abstract, references, citings, index terms

Functional simulation is the most widely used method for design verification. At various 
levels of abstraction, e.g., behavioral, register-transfer level and gate level, the designer 
simulates the design using a large number of vectors attempting to debug and verify the 
design. A major problem with functional simulation is the lack of good metrics and tools to 
evaluate the quality of a set of functional vectors. Metrics used currently are based on 
instruction counts and are quite simplistic. Des ... 

Keywords: verification, functional simulation, code coverage 



10 On the influence of the kernel on the consistency of support vector machines
Ingo Steinwart

March 2002 The Journal of Machine Learning Research, volume 2

Full text available: pdf(343.65 KB) Additional Information: full citation, abstract, references, citings, index terms

In this article we study the generalization abilities of several classifiers of support vector 
machine (SVM) type using a certain class of kernels that we call universal. It is shown that 
the soft margin algorithms with universal kernels are consistent for a large class of 
classification problems including some kind of noisy tasks provided that the regularization 
parameter is chosen well. In particular we derive a simple sufficient condition for this 
parameter in the case of Gaussian RBF kernels ... 

Keywords: PAC model, computational learning theory, kernel methods, pattern recognition, 
support vector machines 



11 A survey on wavelet applications in data mining
Tao Li, Qi Li, Shenghuo Zhu, Mitsunori Ogihara

December 2002 ACM SIGKDD Explorations Newsletter, volume 4 issue 2

Full text available: pdf(330.06 KB) Additional Information: full citation, abstract, references

Recently there has been significant development in the use of wavelet methods in various 
data mining processes. However, there has been written no comprehensive survey available
on the topic. The goal of this paper is to fill the void. First, the paper presents a high-level
data-mining framework that reduces the overall process into smaller components. Then
applications of wavelets for each component are reviewed. The paper concludes by discussing
the impact of wavelets on data mining research an ... 



12 Special issue on special feature: An extensive empirical study of feature selection
metrics for text classification

George Forman

March 2003 The Journal of Machine Learning Research, volume 3
Full text available: pdf(270.38 KB) Additional Information: full citation, abstract

Machine learning for text classification is the cornerstone of document categorization, news 
filtering, document routing, and personalization. In text domains, effective feature selection 
is essential to make the learning task efficient and more accurate. This paper presents an 
empirical comparison of twelve feature selection methods (e.g. Information Gain) evaluated 
on a benchmark of 229 text classification problem instances that were gathered from 
Reuters, TREC, OHSUMED, etc. The results are a ... 

13 Survey articles: Data mining for hypertext: a tutorial survey
Soumen Chakrabarti

January 2000 ACM SIGKDD Explorations Newsletter, volume 1 issue 2

Full text available: pdf(1.19 MB) Additional Information: full citation, abstract, references, citings

With over 800 million pages covering most areas of human endeavor, the World-wide Web is 
a fertile ground for data mining research to make a difference to the effectiveness of 
information search. Today, Web surfers access the Web through two dominant interfaces: 
clicking on hyperlinks and searching via keyword queries. This process is often tentative and 
unsatisfactory. Better support is needed for expressing one's information need and dealing 
with a search result in more structured ways than av ... 



14 Session 9: image indexing and retrieval: An effective region-based image retrieval 
framework 

Feng Jing, Mingjing Li, Hong-Jiang Zhang, Bo Zhang 

December 2002 Proceedings of the tenth ACM international conference on Multimedia 

Full text available: pdf(216.67 KB) Additional Information: full citation, abstract, references

We present a region-based image retrieval framework that integrates efficient region-based 
representation in terms of storage and retrieval and effective on-line learning capability. The 
framework consists of methods for image segmentation and grouping, indexing using 
modified inverted file, relevance feedback, and continuous learning. By exploiting a vector 
quantization method, a compact region-based image representation is achieved. Based on 
this representation, an indexing scheme similar to t ... 

Keywords: continuous learning, inverted file, region-based image retrieval, relevance 
feedback 



15 Intelligent signal analysis and recognition using a self-organizing database
R. Levinson, D. Helman, E. Oswalt

June 1988 Proceedings of the first international conference on Industrial and
engineering applications of artificial intelligence and expert systems -
Volume 2

Full text available: pdf(1.20 MB) Additional Information: full citation, references, index terms



16 Technique for automatically correcting words in text
Karen Kukich

December 1992 ACM Computing Surveys (CSUR), volume 24 issue 4

Full text available: pdf(6.23 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Research aimed at correcting words in text has focused on three progressively more difficult
problems: (1) nonword error detection; (2) isolated-word error correction; and (3)
context-dependent word correction. In response to the first problem, efficient pattern-matching and
n-gram analysis techniques have been developed for detecting strings that do not appear in
a given word list. In response to the second problem, a variety of general and
application-specific spelling cor ...

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent 
spelling correction, grammar checking, natural-language-processing models, neural net 
classifiers, spell checking, spelling error detection, spelling error patterns, statistical- 
language models, word recognition and correction 



17 Searching in metric spaces

Edgar Chavez, Gonzalo Navarro, Ricardo Baeza-Yates, Jose Luis Marroquin
September 2001 ACM Computing Surveys (CSUR), volume 33 issue 3

Full text available: pdf(916.04 KB) Additional Information: full citation, abstract, references, citings, index terms

The problem of searching the elements of a set that are close to a given query element 
under some similarity criterion has a vast number of applications in many branches of 
computer science, from pattern recognition to textual and multimedia information retrieval. 
We are interested in the rather general case where the similarity criterion defines a metric 
space, instead of the more restricted case of a vector space. Many solutions have been 
proposed in different areas, in many cases without cros ... 

Keywords: Curse of dimensionality, nearest neighbors, similarity searching, vector spaces 



18 Routing: Network routing with path vector protocols: theory and applications
Joao Luis Sobrinho

August 2003 Proceedings of the 2003 conference on Applications, technologies,
architectures, and protocols for computer communications

Full text available: pdf(266.53 KB) Additional Information: full citation, abstract, references, index terms

Path vector protocols are currently in the limelight, mainly because the inter-domain routing 
protocol of the Internet, BGP (Border Gateway Protocol), belongs to this class. In this paper, 
we cast the operation of path vector protocols into a broad algebraic framework and relate 
the convergence of the protocol, and the characteristics of the paths to which it converges, 
with the monotonicity and isotonicity properties of its path compositional operation. Here, 
monotonicity means that the weight ... 

Keywords: BGP, algebra, border gateway protocol, path vector protocols 



19 Face recognition: A literature survey 

W. Zhao, R. Chellappa, P. J. Phillips, A. Rosenfeld 

December 2003 ACM Computing Surveys (CSUR), volume 35 issue 4 

Full text available: pdf(4.28 MB) Additional Information: full citation, abstract, references, index terms
As one of the most successful applications of image analysis and understanding, face
recognition has recently received significant attention, especially during the past several 
years. At least two reasons account for this trend: the first is the wide range of commercial 
and law enforcement applications, and the second is the availability of feasible technologies 
after 30 years of research. Even though current machine recognition systems have reached 
a certain level of maturity, their success is ... 

Keywords: Face recognition, person identification 



20 Think globally, fit locally: unsupervised learning of low dimensional manifolds
Lawrence K. Saul, Sam T. Roweis

September 2003 The Journal of Machine Learning Research, volume 4
Full text available: pdf(2.91 MB) Additional Information: full citation, abstract

The problem of dimensionality reduction arises in many fields of information processing, 
including machine learning, data compression, scientific visualization, pattern recognition, 
and neural computation. Here we describe locally linear embedding (LLE), an unsupervised 
learning algorithm that computes low dimensional, neighborhood preserving embeddings of 
high dimensional data. The data, assumed to be sampled from an underlying manifold, are 
mapped into a single global coordinate system of lowe ...



NEC Research Institute Technical Report 1995

Metric Learning via Normal Mixtures 

Peter N. Yianilos 

Abstract: Natural learners rarely have access to perfectly labeled data - motivating the study of 
unsupervised learning in an attempt to assign labels. An alternative viewpoint, which avoids the issue of 
labels entirely, has as the learner's goal the discovery of an effective metric with which similarity 
judgments can be made. We refer to this paradigm as metric learning. Effective classification, for
example, then becomes a consequence rather than the direct purpose of learning. 

Consider the following setting: a database made up of exactly one observation of each of many different 
objects. This paper shows that, under admittedly strong assumptions, there exists a natural prescription 
for metric learning in this data-starved case.

Our outlook is stochastic, and the metric we learn is represented by a joint probability density estimated 
from the observed data. We derive a closed-form expression for the value of this density starting from an 
explanation of the data as a Gaussian Mixture. Our framework places two known classification 
techniques of statistical pattern recognition at opposite ends of a spectrum - and describes new 
intermediate possibilities. The notion of a stochastic equivalence predicate is introduced and striking 
differences between its behavior and that of conventional metrics are illuminated. As a result one of the 
basic tenets of nearest-neighbor-based classification is challenged. 

Keywords: Nearest Neighbor Search, Metric Learning, Normal/Gaussian Mixture Densities, 
Unsupervised Learning, Neural Network, Encoder Network.